175 research outputs found

    Adaptive nowcasting of influenza outbreaks using Google searches

    Seasonal influenza outbreaks and pandemics of new strains of the influenza virus affect humans around the globe. However, traditional systems for measuring the spread of flu infections deliver results with one or two weeks delay. Recent research suggests that data on queries made to the search engine Google can be used to address this problem, providing real-time estimates of levels of influenza-like illness in a population. Others have however argued that equally good estimates of current flu levels can be forecast using historic flu measurements. Here, we build dynamic ‘nowcasting’ models; in other words, forecasting models that estimate current levels of influenza, before the release of official data one week later. We find that when using Google Flu Trends data in combination with historic flu levels, the mean absolute error (MAE) of in-sample ‘nowcasts’ can be significantly reduced by 14.4%, compared with a baseline model that uses historic data on flu levels only. We further demonstrate that the MAE of out-of-sample nowcasts can also be significantly reduced by between 16.0% and 52.7%, depending on the length of the sliding training interval. We conclude that, using adaptive models, Google Flu Trends data can indeed be used to improve real-time influenza monitoring, even when official reports of flu infections are available with only one week's delay

    Using aircraft location data to estimate current economic activity

    Aviation is a key sector of the economy, contributing at least 3% to gross domestic product (GDP) in the UK and the US. Currently, airline performance statistics are published with a three month delay. However, aircraft now broadcast their location in real-time using the Automated Dependent Surveillance Broadcast system (ADS-B). In this paper, we analyse a global dataset of flights since July 2016. We first show that it is possible to accurately estimate airline flight volumes using ADS-B data, which is available immediately. Next, we demonstrate that real-time knowledge of flight volumes can be a leading indicator for aviation’s direct contribution to GDP in both the UK and the US. Using ADS-B data could therefore help move us towards real-time estimates of GDP, which would equip policymakers with the information to respond to shocks more quickly

    Quantifying the relationship between financial news and the stock market

    The complex behavior of financial markets emerges from decisions made by many traders. Here, we exploit a large corpus of daily print issues of the Financial Times from 2nd January 2007 until 31st December 2012 to quantify the relationship between decisions taken in financial markets and developments in financial news. We find a positive correlation between the daily number of mentions of a company in the Financial Times and the daily transaction volume of a company's stock both on the day before the news is released, and on the same day as the news is released. Our results provide quantitative support for the suggestion that movements in financial markets and movements in financial news are intrinsically interlinked

    Quantifying the digital traces of Hurricane Sandy on Flickr

    Society’s increasing interactions with technology are creating extensive “digital traces” of our collective human behavior. These new data sources are fuelling the rapid development of the new field of computational social science. To investigate user attention to the Hurricane Sandy disaster in 2012, we analyze data from Flickr, a popular website for sharing personal photographs. In this case study, we find that the number of photos taken and subsequently uploaded to Flickr with titles, descriptions or tags related to Hurricane Sandy bears a striking correlation to the atmospheric pressure in the US state New Jersey during this period. Appropriate leverage of such information could be useful to policy makers and others charged with emergency crisis management

    The advantage of short paper titles

    Vast numbers of scientific articles are published each year, some of which attract considerable attention, and some of which go almost unnoticed. Here, we investigate whether any of this variance can be explained by a simple metric of one aspect of the paper's presentation: the length of its title. Our analysis provides evidence that journals which publish papers with shorter titles receive more citations per paper. These results are consistent with the intriguing hypothesis that papers with shorter titles may be easier to understand, and hence attract more citations

    The advantage of simple paper abstracts

    Each year, researchers publish an immense number of scientific papers. While some receive many citations, others receive none. Here we investigate whether any of this variance can be explained by the choice of words in a paper's abstract. We find that doubling the word frequency of an average abstract increases citations by 0.70%. We also find that journals which publish papers whose abstracts are shorter and contain more frequently used words receive slightly more citations per paper. Specifically, adding a 5 letter word to an abstract decreases the number of citations by 0.02%. These results are consistent with the hypothesis that the style in which a paper's abstract is written bears some relation to its scientific impact

    Quantifying crowd size with mobile phone and Twitter data

    Being able to infer the number of people in a specific area is of extreme importance for the avoidance of crowd disasters and to facilitate emergency evacuations. Here, using a football stadium and an airport as case studies, we present evidence of a strong relationship between the number of people in restricted areas and activity recorded by mobile phone providers and the online service Twitter. Our findings suggest that data generated through our interactions with mobile phone networks and the Internet may allow us to gain valuable measurements of the current state of society

    Estimating suicide occurrence statistics using Google Trends

    Data on the number of people who have committed suicide tends to be reported with a substantial time lag of around two years. We examine whether online activity measured by Google searches can help us improve estimates of the number of suicide occurrences in England before official figures are released. Specifically, we analyse how data on the number of Google searches for the terms ‘depression’ and ‘suicide’ relate to the number of suicides between 2004 and 2013. We find that estimates drawing on Google data are significantly better than estimates using previous suicide data alone. We show that a greater number of searches for the term ‘depression’ is related to fewer suicides, whereas a greater number of searches for the term ‘suicide’ is related to more suicides. Data on suicide related search behaviour can be used to improve current estimates of the number of suicide occurrences

    Measuring the size of a crowd using Instagram

    Measuring the size of a crowd in a specific location can be of crucial importance for crowd management, in particular in emergency situations. Here, using two football stadiums as case studies, we present evidence that data generated through interactions with the social media platform Instagram can be used to generate estimates of the size of a crowd. We present a detailed analysis of the impact of varying the time period and spatial area considered for the collection of Instagram data. Crucially, we demonstrate how to address issues that arise from changes in the usage of a social media platform such as Instagram. Our findings show how social media datasets carrying location-based information may help provide near to real-time measurements of the size of a crowd

    Quantifying the link between art and property prices in urban neighbourhoods

    Is there an association between art and changes in the economic conditions of urban neighbourhoods? While the popular media and policymakers commonly believe this to be the case, quantitative evidence remains lacking. Here, we use metadata of geotagged photographs uploaded to the popular image-sharing platform Flickr to quantify the presence of art in London neighbourhoods. We estimate the presence of art in neighbourhoods by determining the proportion of Flickr photographs which have the word ‘art’ attached. We compare this with the relative gain in residential property prices for each Inner London neighbourhood. We find that neighbourhoods which have a higher proportion of ‘art’ photographs also have greater relative gains in property prices. Our findings demonstrate how online data can be used to quantify aspects of the visual environment at scale and reveal new connections between the visual environment and crucial socio-economic measurements